Errors in quantum tomography: diagnosing systematic versus statistical errors 
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A prime goal of quantum tomography is to provide quantitatively rigorous characterisation of 
quantum systems, be they states, processes or measurements, particularly for the purposes of 
trouble-shooting and benchmarking experiments in quantum information science. A range of tech- 
niques exist to enable the calculation of errors, such as Monte-Carlo simulations, but their quan- 
titative value is arguably fundamentally flawed without an equally rigorous way of authenticating 
the quality of a reconstruction to ensure it provides a reasonable representation of the data, given 
the known noise sources. A key motivation for developing such a tool is to enable experimentalists 
to rigorously diagnose the presence of technical noise in their tomographic data. In this work, I 
O ' explore the performance of the chi-squared goodness-of-fit test statistic as a measure of reconstruc- 

CD , tion quality. I show that its behaviour deviates noticeably from expectations for states lying near 

' the boundaries of physical state space, severely undermining its usefulness as a quantitative tool 

precisely in the region which is of most interest in quantum information processing tasks. I sug- 
gest a simple, heuristic approach to compensate for these effects and present numerical simulations 
showing that this approach provides substantially improved performance. 



(N 

o 

O 

CD 

Q 



(N 



C| ' PACS numbers: 03.67.-a, 03.65.Wj, 03.67.Mn, 42.50.Dv 

One of the greatest challenges associated with trying to demonstrate a quantum information processing (QIP) 
protocol experimentally is to be able to verify and quantify its successful operation. In some cases, such as Bell [1| 
and steering Q tests of nonlocal quantum phenomena and Kochen-Specker tests of noncontextuality Q, this can be 
O" 1 achieved by violating a measurement inequality. In other cases, such as Shor's factoring algorithm [J], the success of 
the protocol can be readily tested after the fact via a simple test of the "correctness" of the outcome or answer. Often 
the answer is not so clear cut @, however, and quantum tomography can be a valuable tool for achieving this goal. 
This might, for example, take the form of measuring the process directly via quantum process tomography (QPT) [f|, 
' or characterising the state at key stages of the protocol via quantum state tomography (QST) 0). 

At its heart, tomography aims to be a quantitative tool for providing information about the system being studied. 
When wishing to use it to quantify the success or performance of a real-world quantum protocol in a precise way, it 
is therefore critical to have a rigorous and comprehensive recipe for dealing with the effects of noise. The first and 
most obvious ingredient of such a recipe is the necessity to have a way to calculate errors either in the tomographic 
reconstruction itself, or in subsequently derived physical parameters. In Bayesian approaches to tomography, such as 
Bayesian mean estimation Q or Kalman filtering reconstruction Q , the method provides errors automatically as part 
of the reconstruction output. Currentlv, the most commonly used method of tomographic reconstruction, however, 
is maximum- likelihood estimation [ToL fill] . The formal output of this approach is a single quantum state or process, 
the one which mathematically maximises the likelihood function, with no intrinsic uncertainty. Maximum-likelihood 
tomography therefore needs to be augmented in some way to allow the experimenter to estimate errors. Perhaps 
■ the most common way to do this is to use Monte-Carlo simulations, although several other methods have also been 
proposed for doing this in recent years (e.g., [l^ - fl^ ]). One feature typical to most methods for determining errors, is 
that it is first necessary to make assumptions about the precise error model or form of noise affecting the individual 
tomographic measurements. But how do we know when this assumption is actually valid? 

This question highlights the fact that calculating error bars is only half the story. Like any data fitting technique, 
a critical component of tomography must be to have a means of establishing whether the fit or model is in any way 
a meaningful quantitative representation of the original data. This aspect has been largely ignored until recently [15l . 
Il6j . but is particularly important in this context. In tomography, the measurements are sufficiently complex and 
conceptually removed from the more abstract theoretical entities which we use to describe the underlying system, 
like the state and process matrices, that looking directly at the raw data often provides virtually no information to 
the experimentalist. The tomographic reconstruction itself is often the most meaningful graphical representation and 
serves as a proxy for the data when interpreting results. Furthermore, when trying to characterise a quantum system 
with a view to determining fault tolerance, the margins are usually so tight that approximate measures are simply 
not good enough. It is therefore critical to know whether the reconstruction is reasonable and how much confidence 
can be placed in any results derived from it . 

Perhaps the most important motivation for being able to assess the quality of a tomographic reconstruction is that, 
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in any real-world context, measurements are affected by multiple sources of error, both fundamental and technical in 
nature. Fundamental noise arises from the fact that quantum mechanical measurements are inherently probabilistic: 
this leads to statistical errors for any measurement with a finite number of system copies. But in practice, any 
experiment will also be affected by technical noise sources, errors that arise from inaccuracies or instabilities in the 
measurement apparatus. To date, most errors reported in QIP experiments consider only fundamental noise sources 
and ignore errors which arise from technical noise. This is reasonably well justified in cases where statistical counting 
errors dominate, such as multiphoton polarisation tomography, which often involves low count rates and high-precision 
measurements. However, it is likely to be much less reasonable in contexts such as superconducting qubit experiments, 
where systematic errors and imprecision in measurement settings can strongly dominate over statistical errors, making 
it a critical open problem to develop methods for incorporating both fundamental and technical noise sources into 
tomographic error calculations. In either case, however, if tomography is to provide meaningful and rigorous analysis, 
it is necessary to justify its assumptions quantitatively. 

A range of statistical tools exists for address ing, such questions (see, e.g., JT3, EH)) under the broader umbrella of 
hypothesis testing, such as likelihood-ratio tests [3, and chi-squared tests. In this paper, I explore the standard chi- 
squared goodness-of-fit test statistic as a measure of tomographic reconstruction quality and examine its performance 
from an operational perspective. I show that, while the statistic performs well for full-rank mixed states, its behaviour 
becomes much more complex for states lying near the boundary of the space of physical states: in this region, making 
such measures quantitatively rigorous requires careful treatment of the constraints imposed by the tomographic 
reconstruction process. In particular, I show that naively assuming that the optimal matrix involves d degrees of 
freedom leads to substantial over-diagnosis of technical noise, severely undermining the quantitative value of this and 
other similar measures. Determining the precise effect of the physicality constraints on goodncss-of-fit and hypothesis 
testing is extremely complex, because of the nature of the convex inequality constraints imposed by positivity. In 
this work, however, I introduce a simple approach to addressing this problem, based on counting the number of free 
parameters in rank-deficient mixed states. Applying this to the context of chi-squared tests, I define two related ways of 
analysing the chi-squared statistic to assess reconstruction quality and demonstrate that they perform substantially 
better in typical experimental situations than the naive approach which simply ignores the effects of physicality 
constraints. Finally, I discuss the usefulness and reliability of the suggested techniques in real- world applications. In 
the context of photonic tomography carried out in the presence of fluctuations in source brightness, I show that they 
are able to diagnose relatively low-level fluctuations, even when they are smaller than the statistical counting noise. I 
also briefly discuss how the chi-squared test statistic provides qualitative evidence for the value of using over-complete 
rather than minimally complete measurement sets. 



In this section, I provide a brief description of a specific simple form of state tomography loosely following the 
treatment of QPT described in Ref. po| . Since there are straightforward correspondences between the different types 
of tomography (e.g., state and process), I will focus here on state tomography. 

Consider the following scenario: we have a source which produces identical copies of a quantum system in the state 
p (dimension d) and we implement a particular quantum state tomography which is defined by a set of informationally 
complete measurement projectors Pj in order to estimate the "true state" p. Performing this tomography will yield 
a vector of measurement values Af = (A/j) which can be used to reconstruct an estimated state, g. Throughout this 
paper, I use calligraphic letters and symbols to indicated measured or estimated values, and the notation that a = (cij) 
is a vector and a = (ajk) is a matrix. Since the true state can be written as a linear combination, 



of some orthonormal basis of dxd matrices, {Oj}, the problem of estimating the quantum state reduces to the problem 
of estimating the d 2 coefficients o = (oj ) . 

The simplest way to estimate the underlying quantum state is to implement an algebraic linear inversion, which 
can be summarised as follows. We first note that the measurement projectors can also be expanded in terms of the 
orthonormal basis: 



A. Maximum likelihood quantum state tomography — the basic framework 
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FIG. 1: Simulations illustrating the effects of noise on linear inversion. A known, nearly pure target state (a) is used to generate 
500 sets of Poissonian-distributed noisy data with a mean photon flux of 100 photons per measurement setting, which are then 
reconstructed using a naive linear inversion (b) and maximum likelihood estimation (c). The linear inversion produces many 
non-physical states which lie outside the Poincare sphere. 

where the coefficients q = {qjk) = (Tr{0\Pj}) can be determined without reference to measurement results. These 
projectors also define expected measurement probabilities in terms of the unknown "true state" , pj = Tr {Pjp}, which, 
via some simple algebra, can be related to the orthonormal coefficients: 

Pj = Tr {P iP } = Tr {p/ p} = £ q* k Tr {o{p} = £ q* k o k . (3) 

k k 

We can also define a measured "probability" in terms of measured values and a potentially unknown normalisation 
parameter, Vj = Afj/77. Making the standard frequentist assumption that this measured frequency is the best estimate 
of the underlying true-state probability, we can then invert Eq. §3§ above to give estimates of the Oj coefficients: 

g = [q*r lr p. (4) 

Technically, the matrix q is invertible if and only if the set of measurement projectors forms a linearly independent or 
minimally informationally complete set. If not, the estimated coefficients can instead be calculated using the Moore- 
Penrose pseudoinverse [2l[. In particular, if the measurement set is over-complete, this then gives the least-squares 
solution to Eq. ((4]). Finally, we reconstruct an estimated density matrix via a modified form of Eq. ((T|): 

d 2 

Q = Y,°i°r ( 5 ) 

Note that, if the normalisation parameter, 77, is unknown, as is commonly the case in photonic tomography, for 
example, it is straightforwardly calculated at this stage as part of the tomographic reconstruction as follows. Firstly, 
from the definition of Pj, note that AfO = [q*] Af. Then, defining a new, unnormalised reconstructed matrix, g, by 
applying the inversion matrix directly to the vector of measured counts, it is simple to show that: 

d 2 

Q^T / ^T 1 Af) j O j =77g, (6) 
i=i 

where the trace of g is equal to the unknown normalisation parameter, 77, and the normalised part is just the standard 
reconstructed density matrix, g. 

Equations (|4|) and ([5|) reveal the first problem that arises from noise in the measurements. Because the measured 
values, Afj, and therefore also the estimated coefficients, Oj, necessarily contain noise, even if only as a result of the 
statistics of measurements made with a finite number of system copies, the linear inversion estimate from Eq. ([5]) 
is not necessarily physical. Specifically, while it is possible to ensure that it is Hcrmitian (by choosing all Oj to be 
Hcrmitian), the reconstructed matrix may not be positive. For a single qubit, this corresponds to the state estimate 
lying outside the Bloch (Poincare) sphere [Fig.Q]. 

The typical way to solve this problem is to use maximum likelihood tomography [ToL [TI[ , an optimisation procedure 
which is very similar to common curve-fitting data analysis techniques. Importantly, however, this requires no 
assumptions about the form of the measured state except that it be physical. Instead, the goal is simply to search 
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the entire space of allowed, physical density matrices for the one with the highest probability of generating the 
measurement results. This probability is known as the likelihood function. 

In any fitting procedure, provided the measurements outnumber the fitting parameters, the optimal values of the 
fitting parameters will necessarily deviate slightly from the individual observed data points. This is also true for 
maximum likelihood tomography. In this case, however, imposing physicality constraints provides an additional cause 
for mismatch between the observed data and values predicted from the reconstructed state, even in the case of a 
minimally complete measurement set. 

For a particular measurement record, j\f, the likelihood function is [Toj 

C(jV\p) cx JJpf = J] Tr {P jP }"> , (7) 

3 3 

omitting prefactors that depend only on the particular measurement record and therefore do not influence the outcome 
of the optimisation over physical states. Note that, rather than maximising the likelihood function, it is actually more 
common to carry out an equivalent minimisation of the negative log-likelihood penalty function: 

n(Af, p) = - \ogC(N\p) = - J2^3 log(Tr {/>}). (8) 

3 

This type of optimisation falls in the special class of convex optimisation problems [22|, [23| . While they still scale 
exponentially in the system size, such problems are nevertheless computationally efficient for a given system size and 
recent efforts combining convex optimisation with compressive sensing techniques have made substantial progress 
towards minimising both the experimental and numerical resources required for solving them [24l - l27j . Overall, the 
maximum-likelihood quantum state estimation optimisation problem can be summarised as: 

minimise IL(j\f,p), 
subject to p > & Tr{/>} = 1, (9) 

where I have assumed that p = p* has been enforced by choosing all Oj to be Hermitian. 

This optimisation can now be implemented in different ways (e.g., Refs [Tol [Til l23j ) . For this work, I now follow 
the method in Ref. [TP ], which assumes that the counts are sampled from Gaussian distributions with variances, 
<tJ , equal to the means of the distributions (as is true for the Poissonian statistics common in photonic tomography 
experiments). In this case, the log-likelihood penalty function reduces to a weighted least squares form: 

= £ - ■ do) 

j o-j(p, AO 2 j n 3 (p,Af) 

where rij(p,j\f) = AfTr{Pjp} and I have again explicitly included the normalisation parameter. (Note that the 
more standard, linear weighted least squares problem would involve fixed variances which do not depend on the 

optimisation parameters.) This can now be recast as follows as a particular form of convex optimisation called a 

semidefinite programme [T^, [22|, [28]] . Such problems are the focus of a large body of work and many good numerical 
routines are readily and freely available to solve them. 

To do this, I follow the approach of Ref. (28| (see Ref. [T]|), which has two key steps. The first is to replace the 
nonlinear objective or penalty function with a linear objective function and a nonlinear constraint: 

minimise y, 

subject to y — tfj/rij > 0, 

3 

p>0 & Tr{p} = l, (11) 

where 5j = S(jV, p,Af) = Afj — rij(p,jV) are the residuals between measured and expected counts and y is a so-called 
slack variable, which will achieve a minimum value for the same density matrix as the negative log-likelihood function. 
Secondly, the nonlinear constraint is then replaced by an equivalent larger-dimensional linear matrix constraint to 
give the final optimisation problem: 
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p>0 & Tr{p} = l, (12) 
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where n is the diagonal matrix defined according to (n)- = rij(p, AT). This is now written explicitly in the form of a 
semidefinite programme. 

Finally, if the normalisation parameter is unknown, it can again be calculated as part of the reconstruction process 
by simply removing the third constraint that Tr{p} = 1 and redefining n,j(p,Af) = AfTr {Pjp} = Tr{Pj/5}, where p 
is an unnormalised "density matrix" variable. The optimisation will now output an unnormaliscd estimate, g = Afg, 
where the trace is again the unknown normalisation parameter and the normalised part is the expected maximum- 
likelihood estimate. 

Throughout this paper, unless otherwise stated, I will focus on the context of photonic tomography, although 
the main concepts should generalise straightforwardly. For concreteness, I will assume that the normalisation is an 
unknown parameter which is calculated as part of the reconstruction process, i.e., the output is the unnormaliscd 
density matrix, g. I will also assume that data arc collected using successive, single-projector measurements. Therefore, 
considering only fundamental noise sources, for a given state, p, the measured counts Afj are Poissonian-distributed 
random variables with mean, (Afj) = Tij(p) = Tr{Pjf>}, and variance, o~ 2 (p) = ((Afj — (Afj)) 2 )=rij. 



In the absence of technical noise, given a set of measured data used to estimate a quantum state, we would expect 
the data to always agree with the predicted counts (as calculated from the reconstructed state) to within limits defined 
by fundamental statistical fluctuations. We can therefore diagnose the presence of technical noise if we can show with 
some confidence that this is not the case. This is exactly the scenario that is described by a standard chi-squared 
test. We have a set of observed frequencies, {Afj}, and a set of expected outcome frequencies, {rij(g)}, with standard 
deviations, Q" 2 ( p), and we can construct the standard chi-squared test statistic for measuring the "goodness of fit" as 



Note that X 2 has the same form as Eq. (fTTTj) . the negative log-likelihood penalty function, H(Af, g), in the Gaussian 
approximation, which is exactly the limit in which chi-squared-based tests are valid. Intuitively, this provides a 
motivation for the approach. We would like to answer the question: "How well does the reconstructed density matrix 
fit the measured data?" This is exactly what the likelihood function itself seeks to answer. Other approaches, like 
likelihood-ratio tests, build more directly on this intuition. 

Under appropriate conditions, the chi-squared test statistic, X 2 , is distributed according to a x 2 distribution with 
K = M — c degrees of freedom (DOFs), where c is the number of constraints imposed by the measurement scheme 
and the fitting of density-matrix parameters during the reconstruction process. The mean and variance of the x 2 
distribution are k and 2k, respectively (l7j . The same results can be obtained directly from Eq. (|13|) if the residuals in 
the test statistic are Gaussian random variables. For certain states, however, there may be individual measurements 
with very low counts, even if many system copies are measured overall. For the case of Poissonian photon statistics, 
this introduces a slight correction to the variance, giving instead a value of 2k + 1/rij flU l29j. 

In order to ensure validity of the chi-squared test, standard practice recommends that the expected counts for any 
given measurement should be large enough [Tt], [H|, normally greater than 5 or 10, or at least that that is true for 
the large majority of measurements. If that condition is violated (which would be rare for the less stringent condition 
in practical situations), it is allowable to group data from different measurements into bins with sufficiently large 
total expected counts. For systems where the measured counts arise from a well-defined number of total trials, such 
as ion-trapping experiments, the measurements would be described by multinomial statistics, making the Poissonian 
statistics used in the above definition only an approximation. Here, as well as ensuring that there are enough expected 
counts in each bin, it is also important to ensure that no count represents too large a fraction of the results for a given 
setting. However, since one count in each measurement setting in this context is always dependent on the others via 
the constraint on the total number of trials, it is always possible to choose to eliminate the large count (there can be 
only one) to ensure the Poissonian approximation remains valid [3(| . 

Both mean and standard deviation values depend on the measurements and constraints for the system and therefore 
vary with system size and measurement recipe. For everyday use, it is therefore generally more intuitive to use the 
reduced chi-squared test statistic, which I will often call simply the reconstruction quality for compactness: 



B. Reconstruction quality: the chi-squared "goodness of fit 



follows 0,13: 




(13) 



Q(g)=X 2 (g) = -X 2 (g), 



(14) 
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which physically describes the mean-square residual count error (per degree of freedom) in the final reconstruction. 
Assuming, as before, that the residuals are Gaussian random variables, the mean and variance of Q are 1 and 2/k, 
respectively. 

For larger values of K, the quality parameter should generally be very close to 1 if the experimental noise is 
dominated by the known, fundamental statistical errors. If the value is instead a sufficient amount larger than 
expected, it indicates that the statistical noise is not sufficient to explain the noise observed in the raw data. More 
concretely, for a given reconstruction, one can calculate either the standard or reduced chi-squared test statistic and 
compare it with the expected value, or rather, with a cut-off value which determines a particular confidence level. If 
the measured value is above the cut-off, then this diagnoses the presence of unexpected noise at that given confidence 
level. While it is not necessarily obvious how to incorporate this information into the noise model used to make 
improved error estimates, it does provide a quantitative assessment of the fitting process itself and the quality of the 
measured data. 

Up to this point, the discussion has progressed using basic statistical tools, but when applying these techniques 
to quantum tomography, the real difficulty that arises is how to deal with the effects of physicality constraints, 
more specifically, the inequality constraints imposed by requiring the reconstruction to be positive. As positivity 
constraints come into play, the general effect is to confine the reconstruction onto a surface of reduced dimensionality 
at the boundary of physical state space, which tends to reduce the number of free parameters in the optimisation. 
For example, for a single qubit, positivity forces points outside the Bloch sphere onto the surface of the sphere, a 
two-dimensional surface within a three-dimensional space [Fig. [T] . But the number of constraints determines the entire 
shape of the test statistic's expected distribution via k, the DOFs. For example, while an arbitrary d-dimensional 
density matrix (unnormalised) contains d 2 independent parameters (d 2 — 1, if normalised), an arbitrary d-dimensional 
pure state can be specified by 2d— 1 real parameters (again, unnormalised). Since the pure states lie at the boundary of 
the physical state space, this is where these effects are most likely to manifest. Unfortunately, in quantum information 
processing applications, the goal is often to produce states which are as close as possible to pure states (or unitary 
operations in the case of quantum processes) . Such experiments are therefore almost always aiming to probe the region 
where these effects will be seen, especially as they become ever more reliable and the need for precise quantitative 
analysis becomes ever more critical. Furthermore, in cases where measurement noise is large enough for physicality 
constraints to play a role, it is known that maximum likelihood tomography has a tendency to produce states that 
contain one or more zero eigenvalues, e.g., in one-qubit experiments, to produce pure states. In the remainder of this 
paper, I will study this issue and how to resolve it. 

C. Performance of the X 2 test statistic 

To test the usefulness of the reconstruction quality defined above, it is first necessary to check whether X 2 does 
conform to a \ 2 distribution in typical experimental scenarios. Figure [2] shows examples of a range of numerical 
tomography simulations performed with different system sizes and measurement sets. Each case considers 1000 
random Werner-like mixed states of the form p t = p\ip){il)\ + (1 —p)Id/d, where \ip) is a arbitrary pure state randomly 
chosen from the Haar measure and p was randomly chosen to lie between 1/3 and 2/3. The six plots show histograms 
of the measured X 2 test statistic distributions for typical examples, exhibiting good agreement with the probability 
density function for a \ 2 distribution with the M — d 2 DOFs expected given d 2 fitting parameters for an arbitrary 
density matrix (also plotted). The means and standard deviations of the X 2 distributions for all systems studied 
(d = 2 to d = 8 and M = 6 to M = 216) are plotted in Fig. [3] against the number of independent DOFs, M — d 2 . 
The lines show the expected mean and standard deviation of M — d 2 and iJ2(M — d 2 ), respectively. Throughout 
this paper, I will consider example measurement sets based on the platonic solids for qubit-based systems |3lj . given 
by: a tetrahedron (4 measurements, minimally complete), a cube (6 measurements, over-complete), an octahedron 
(8 measurements), a dodecahedron (12 measurements), and an icosahedron (20 measurements). For qudit-based 
systems, I will use an overcomplete set which is a generalised form of the cube set for qubits, and which includes one 
measurement for each of the computational basis states and four measurements to probe each of the density matrix 
coherences (two real and two imaginary two-state superpositions with opposite signs) [32j . 

The above simulations were all carried out using randomly chosen mixed states which always resulted in recon- 
structed density matrices with full rank (d eigenvalues), where the number of DOFs can be expected to match the 
value predicted using the full d 2 free parameters of an arbitrary mixed state as constraints. Figure [U on the other 
hand, shows the results of numerically simulated tomographies of randomly chosen Werner states (mixtures of max- 
imally entangled states with the maximally mixed state) as a function of the population of the pure, maximally 
entangled component of the state. These random states were used to generate noisy data for different measurement 
sets based on tensor products of singlc-qubit, platonic-solid measurements, but with the same total tomography time 
for each (i.e., correspondingly less time per setting for the larger sets). Figure0|a) demonstrates again that, for higher 
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FIG. 2: Comparison of measured distributions of the unnormalised X 2 test statistic from numerical tomographies with \ 2 
distributions with M — d DOFs (red curves). Simulations were performed using randomly chosen Werner-like mixed states of 
the form pt = p\tp){ip\ + (1 — p)Id/d, where is an arbitrary pure state randomly chosen from the Haar measure and p was 
randomly chosen to lie between 1/3 and 2/3. The example systems shown correspond to: (a-c) single-qubit states with 6, 8 and 
20 measurements, respectively; single-qudit states with (d) d = 3 and M = 15 and (e) d = 7 and M = 91; and (f) two-qubit 
(d = 4) states with 12 measurements per qubit (M = 144). 
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FIG. 3: Mean (red) and standard deviation (blue) of the measured X 2 distribution for numerical tomography simulations 
performed using randomly chosen mixed states (see also Fig. [2]), plotted as a function of the number of DOFs in the measured 
data, given an arbitrary full-rank mixed state. The lines show the expected mean, M — d 2 , and standard deviation, *J%M — d 2 ), 
for the appropriate \ 2 distribution. The circles (o) show single-qubit simulations with 6, 8, 12 and 20 measurement settings; 
the diamonds (0) show two-qubit simulations with 6, 8 and 12 measurements per qubit (M — 36, M — 64 and M = 144); the 
triangle (A) shows three-qubit simulations with 6 measurements per qubit (M = 216); and the squares (□) show single-qudit 
simulations for d — 3, 5, 6 and 7, with M — 15, 45, 66 and 91, respectively. The error bars are all smaller than the size of the 
plotted points and the data agrees well with the theoretical prediction within reasonable margins set by those errors. All of 
the simulations used 1000 randomly chosen mixed states, except for the three-qubit system, using 200 states. 



levels of mixture, the observed distribution of X 2 agrees well with the corresponding % 2 distribution with M — d 2 
DOFs. For higher purities, however, the means of the observed distributions begin to deviate substantially above the 
expected values, suggestive of a decrease in the number of constraints imposed by the fit of free parameters in the 
density matrix. This agrees at least qualitatively with what would be expected for higher purities, given that a pure 
state contains substantially fewer free parameters than a full-rank mixed state. It is further supported by Fig. Hljb), 
which shows that the increase in X 2 follows the decrease in fraction of reconstructed states of full rank (containing 
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FIG. 4: Simulations illustrating the effect of mixture on X 2 in two-qubit tomography, (a-c) Werner states [33|] of the form 
pt = p\ip)(ip\ + (1 — p)/4/4 (J4 is the identity operator in four dimensions and is a maximally entangled state) studied as 
a function of p. For each value of p, randomly chosen states are used to generate Poissonian-distributed noisy data sets for 
different platonic solid measurement sets [3l[: tetrahedron (minimally complete, red: — o — ), cube (over-complete, blue: — — 
), octahedron (green: J ), and dodecahedron (magenta: x ); which are then reconstructed using maximum likelihood 
estimation. Each point represents an average over 500 random states, except in the four-measurement set data, which averages 
over 7500 states per point. The graphs show (a) the average chi-squared test statistic for the noisy reconstructed states (the 
much-larger- valued twelve-measurement set exhibits the same trend, but is omitted in order to show better detail with the other 
sets); and (b) the fraction of reconstructed states which are full rank; and (c) the average fidelity of the noisy reconstructed 
states with the known target state. In (a), the solid lines show the expected value of the test statistic, assuming M - d 2 DOFs 
and the dashed lines show the expected position of the first standard deviation higher. The dotted lines show the width of the 
observed distributions (because they are skewed, the lines show the x 2 -function positions which correspond to the integrated 
probabilities for the ±er positions of a symmetric Gaussian distribution), (d-e) 150,000 randomly chosen, mixed two-qubit state 
reconstructions (6 measurements per qubit) with different numbers of nonzero eigenvalues, (d) Comparison of the observed 
distribution of X 2 with the theoretical x 2 probability distribution for M — d 2 = 20 DOFs (red curve) and an optimal x 2 
distributions with a fit to the number of DOFs over arbitrary real numbers (green curve: M—c ~ 23.97). Vertical lines show 
the expected means and standard deviation positions, (e) X 2 vs purity (Tr {p 2 }) of the reconstructed states. Dotted lines at 
Tr {f? 2 } = 1/4, 1/3 and 1/2 show the minimal purities for states with 4, 3 and 2 nonzero eigenvalues, respectively. 



no nonzero eigenvalues). 

But how significant is this discrepancy? Figure HJa) shows that, for the cube setting (six measurements per qubit), 
the observed mean moves almost one standard deviation higher than the expected value based on d 2 constraints. 
The significance of such a shift can be seen by considering how the chi-squared test statistic is designed to be used. 
Ultimately, we wish to be able to detect the presence of unexpected noise in the raw tomography data, to determine 
whether the reconstruction is in fact reliable. Assuming that the increase in mean results from a straightforward 
increase in DOFs, a naive calculation shows that a shift of one standard deviation would result in around 23% of 
excess-noise-free data sets being diagnosed as excess-noisc-affected at the 95% confidence level, instead of the 5% 
expected from the confidence definition. In reality, this assumption is too simplistic, as Figs |3Jd) and (c) explore in 
more detail. 

Figure 0Jd) shows the observed distribution of the chi-squared test statistic, X 2 , for 150,000 randomly chosen, 
mixed target two-qubit states. The target mixed states were chosen by first generating a random number of random 
eigenvalues in a way which biased the states towards the boundaries of the physical state space, to ensure that there 
were reconstructed states of every rank, and then assigning to each nonzero eigenvalue a pure eigenvector uniformly 
distributed according to the Haar measure (made orthogonal via Gram-Schmidt reduction). Figure Etc) provides a 
qualitative verification that this method docs produce mixed states throughout the state space. These states were 
then used to generate noisy measurement sets (with Poissonian noise) with 6 measurements per qubit (M = 36). The 
red curve shows the x 2 distribution for M—d 2 = 20 DOFs and the green curve shows the optimal x 2 distribution found 
by fitting the number of independent parameters (M—c ~ 23.97). Using, somewhat self-referentially, a standard chi- 
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squared test to determine whether the observed histogram could be plausibly obtained from either of the theoretical 
X 2 distributions, the calculated p- value was in both cases (to within machine precision). Thus, unlike the earlier 
example of full-rank mixed states illustrated in Figs [2] and El the test statistic for an arbitrary matrix cannot simply 
be characterised in terms of a \ 2 distribution with a single well-defined number of free parameters. In fact, it is 
known that hypothesis testing statistics do not generally, in the presence of inequality constraints, asymptote towards 
a simple x 2 distribution, but instead often towards a mixture of % 2 distributions with different DOFs, known as a 
chi-bar-squared (x 2 ) distribution [34l - |36l ]. This can be seen qualitatively from Fig. Hte), where several distributions 
in X 2 with different means and widths are clearly visible at different purities. 

Applying the same reasoning as above, if these results were analysed using the assumption of d 2 free parameters in 
the density matrix, around 16% (4.8%) of the reconstructed states would be diagnosed as affected by excess noise at 
the 95% (99%) confidence level, although the raw counts do only contain known statistical noise. Using this approach 
would therefore substantially overestimate the presence of noise when trying to diagnose systematic errors, severely 
undermining the validity of the chi-squared test statistic as a quantitative diagnostic in quantum tomography. Similar 
results would also be observed when using alternative statistical tests, such as the likelihood-ratio tests proposed in 
Ref . [l6[ , which are known to result in x 2 limiting distributions |35|, H3] ■ 

Returning briefly to Fig. IHa), note that the octahedron setting (eight measurements per qubit) exhibits a smaller 
discrepancy by comparison with the standard deviation of the expected x 2 distribution (2k). This results from the fact 
that the maximum range of DOFs is determined only by the range of constraints (2d — 1 to d 2 ), producing a smaller 
relative discrepancy for larger measurement sets. Unfortunately, it still scales badly as a function of system size for 
multiqudit systems. A rough calculation for larger numbers of qudits shows that this effect becomes exponentially 
more pronounced with increasing system size unless the number of measurements per qudit is greater than c? 4 , where 
in this case d is the singlc-qudit dimensionality (i.e. > 16 for qubits). Note that this holds for both photonic systems 
as well as atomic-like systems with near-deterministic measurements. 

It is therefore critical to develop a method for accurately dealing with the reduced number of independent DOFs 
in tomographic reconstructions near the physical state boundary if it is going to be possible to diagnose systematic 
noise in a rigorous quantitative way. 

1. Application example: a note on measurement sets and completeness 

Despite the quantitative discrepancy in the observed value of the chi-squarcd test statistic at high purities, it can 
already provide useful qualitative information. 

Figure [4jc) shows the average target-state fidelity, demonstrating that the over-complete sets perform somewhat 
better than the minimally complete, four-measurement set at larger purities. This supports the conclusions of Ref. [3l[ 
that, despite a larger number of measurements and concomitantly shorter integration times per measurement, over- 
complete measurement sets offer advantages over even optimal, minimal measurement sets [38| . Indeed, the minimally 
complete measurement set only seems to perform comparably well to the over-complete sets at higher levels of 
mixture where, particularly for these Werner states which smoothly increase in symmetry as the mixture increases, 
the measured data contains almost no information, in the sense that all counts should be approximately equal, 
irrespective of what measurement is being made. By contrast, there are substantial improvements for over-complete 
measurements for states near the boundary of physical states, which is generally the region of most interest for QIP 
contexts. This can be understood fairly easily by looking at the X 2 test statistic. 

Figure Sta) shows that, provided the state is sufficiently far from the state-space boundary relative to the measure- 
ment noise, the minimally complete tomography produces a perfect fit to the data every time, even when the data is 
known to contain noise. Except near the pure-state boundary where the physicality constraints can play a role, the 
minimally complete measurement set is completely unable to distinguish between quantum state and noise, and the 
tomography faithfully reconstructs the noise instead of the underlying state. The over-complete measurement set, 
however, contains redundant information which allows the tomographic optimisation to diagnose and reject some of 
the noise present in the data. While inevitably resulting in some mismatch between reconstruction and data (X 2 ^0), 
this pulls the reconstruction towards the underlying state, resulting in a reconstruction which is at least as accurate 
as the minimal tomography and generally more so — even though the integration time at each setting is substantially 
less (less than half in this example). Furthermore, it is only by taking over-complete measurements that it is even 
conceivable to use the data to check the validity of the underlying model. But this should not really be surprising. 
When asking students in an undergraduate laboratory to measure a quantity related to the gradient of a function, 
we never ask them to take two data points and draw a line between them. We expect them to measure a series of 
data points and use a numerical fit to all the data and our expectations should probably be no different for quantum 
tomography. 

Essentially, there are two guiding principles for choosing tomographic measurement sets: 1) using over-complete 
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measurement sets with redundancy of information will generally perform better than minimally complete sets, provided 
they are equivalent ly symmetric; and 2) the accuracy of tomography will increase as coverage of the Hilbert space 
increases, but for multiple qubits, this would generally require the use of entangled measurements which may therefore 
introduce other challenges that could counteract potential improvements. 

It is worth noting that determining exactly what constitutes a minimally complete set of measurements can be 
nontrivial, requiring a detailed understanding of the experimental specifics of the measurement apparatus. For 
example, for the photonic systems described above, where each measurement setting probes a single projector, four 
settings per qubit constitutes a minimal set (the final parameter is required for normalisation purposes, so the minimal 
set has d q elements, where q is the number of qubits, not the usual d q ~ 1). In ion-trapping tomography, however, 
where the measurements are essentially deterministic and the number of system copies used is therefore fixed and 
well-defined, it is typical to use three measurement settings per qubit. For a single qubit, this gives six measured 
probabilities, but still only three independent parameters. For two qubits, however, the four probabilities for each 
setting include three independent parameters, corresponding to 27 independent parameters for all 9 settings, and 
this already provides a substantial amount of measurement redundancy [16| • So although the standard three settings 
provide a minimally complete set for one qubit, they produce over-complete sets for two or more qubits. 

Interestingly, this observation explains trends illustrated in Ref. [3lT | as well as Fig. 21 which justify the guiding prin- 
ciples described above. One-qubit simulations show that using more measurements generally improves tomographic 
accuracy, resulting in some qualitative sense from greater coverage of the underlying Hilbert space. By contrast, 
when using separable measurement sets constructed from tensor products of single-qubit measurements, simulations 
with two or more qubits generally show some improvement between a minimally complete set and an over-complete 
set, but that little further advantage is gained by increasing the number of single-qubit measurements beyond what 
gives rise to an over-complete set. For example, unlike the one-qubit case, in the atomic-like systems, the standard 
cube-based measurement is already near-optimal for two or more qubits [3l| . This saturation of further improvement 
can be understood by noting that separable tensor-product projectors are a measure-zero subset of the space of all 
pure measurements [331. That is, while it is possible for a set of separable measurements to span a multiqubit Hilbert 
space, provided the measurement set is over-complete, adding more separable measurements is unlikely to substan- 
tively increase the coverage of the encompassing space, which almost entirely consists of entangled states. Thus, 
for multiple qubits, further significant improvements beyond the cube-based tomographic sets are only likely to be 
achieved using at least some entangled states (see, e.g., Ref. [40|). 

D. Calculating degrees of freedom and the reduced reconstruction quality 

Figure U provides an indication of the complexities that arise when trying to understand the effects which inequality 
physicality constraints impose on the statistics of goodness of fit in tomographic reconstructions, especially as a 
function of important experimental variables such as state purity. As illustrated in Fig.[4je), such inequality constraints 
can produce complex distributions incorporating several, distinctly different underlying distributions. Determi ning 
the precise form of this expected distribution in advance is known to be a complex or even intractable problem [ill |42| 
and, at this stage, more investigation is still required to find the ideal solution for the context of tomography. In 
this section, however, I suggest a simple, heuristic way to try and compensate for the different conditions imposed 
by boundary conditions on different tomographic reconstructions. The intuition motivating this approach builds on 
the fact that when the physicality constraints come into play, they tend to result in reconstructions with some zero 
eigenvalues, thus reducing the number of free parameters in the reconstructed density matrix [see, e.g., Fig. Ufa)] 
and in turn also the number of constraints imposed by optimisation. One might therefore expect that tomographic 
reconstructions which are more strongly affected by the physicality conditions would tend to arise from distributions 
characterised by more DOFs. 

The most obvious way to work out the number of constraints c introduced by tomographic optimisation is to count 
the number of independent parameters in an arbitrary density matrix with I nonzero eigenvalues (1 < I < d). We will 
assume that the matrix is unnormaliscd to allow the calculation of an unknown normalisation parameter during the 
tomographic reconstruction, but this also makes the calculation easier. The first eigenvector is an unknown, arbitrary 
d-dimcnsional pure state, so it is described by 2d — 1 independent parameters. The next eigenvector, however, must be 
orthogonal to the first and therefore lies in the (d— l)-dimensional subspace which remains when the first eigenvector 
is removed from the overall state space. It therefore adds a further 2{d— 1) — 1 independent parameters to the density 
matrix. Continuing in this manner until the right number of eigenvectors have been added, it is straightforward to 
calculate the total number of independent parameters, and therefore also the number of constraints to be: 



i-i 

c = ^[2(d-0~l] =l(2d-l), 



(15) 
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FIG. 5: Counting the number of nonzero eigenvalues. Histograms of values of the smallest eigenvalue of the reconstructed 
states, showing the (a) positive and (b) negative values. Red circles indicate bins containing zero samples. The grey, shaded 
area shows the region counted as zero for all the simulations carried out in this paper. 



which gives c = Id — 1 for pure states and c = d 2 for full-rank mixed states, as required. 

The number of eigenvalues in the reconstructed states can be counted by identifying a cut-off and counting as 
nonzero only those eigenvalues which are greater than that value. Figure EJb-i) shows a histogram of the smallest 
eigenvalues for reconstructed two-qubit states from the same simulation described in Fig. [5Jd— e). Histograms for 
all systems studied and for all eigenvalues (except the largest, which must by definition be larger than 1/d) show a 
similar trend, namely some distribution of nonzero eigenvalues at larger values which tends to zero by around 10~ 6 
and another large lobe of the distribution below a few 10~ 7 which can be identified as "falsely" nonzero due to some 
form of imprecision noise. This is confirmed by the matching lobe in the histogram showing the elements of the 
distribution with negative values in Fig.[S[b-ii). This threshold, which is far above machine precision, may be a result 
of and set by tolerances in the reconstruction process itself, which is otherwise supposed to ensure that eigenvalues 
are all positive. 

Figures |6ja) and (b) show the mean and standard deviation of the X 2 test statistic distributions obtained by 
grouping the reconstructed states by number of nonzero eigenvalues, plotted against the number of independent 
variables defined by Eq. ([T3|) . The results show generally good agreement with the theoretical curves which indicate 
the values expected for a % 2 distribution with the appropriate number of parameters, in particular for all those 
cases corresponding to either pure states or full-rank mixed states. In the two-qubit tomography, however, the cases 
corresponding to 2 or 3 nonzero eigenvalues, while in the right region, do not provide such good agreement. To illustrate 
this in more detail, Fig. &c) shows the observed distribution for the two-qubit simulations with six measurements 
per qubit (M = 36). There is excellent agreement of the optimal (green) and expected (red) theoretical distributions 
for Figs|6jc-i) and (c-iv), but a visible disparity in Figs |6fc-ii) and (c-iii). This trend was relatively consistent for all 
systems studied. 

These observed discrepancies are most pronounced when considering minimally complete tomographic reconstruc- 
tions, such as those based on the tetrahedron single-qubit measurement set. When the state is sufficiently far from 
the state-space boundary, there are enough free parameters in the resulting full-rank reconstruction to allow a perfect 
fit to the measured data, and the measured test statistic is always zero. Near the boundary, however, when the 
reconstruction has fewer nonzero eigenvalues, the physicality constraints prevent a perfect fit to the data and the 
test statistic becomes again nonzero. In this case, because the full-rank reconstructions have no DOFs, this in some 
sense isolates these discrepancies from competing factors and the change in behaviour is much more pronounced. 
Studying this may therefore provide key insight into the general problem. Nevertheless, evidence from other perfor- 
mance measures (e.g., 3l|) strongly suggests that experimentalists should in any case avoid using minimally complete 
measurement sets for tomography. Fortunately, in other cases, the observed discrepancies are generally substantially 
smaller than the width of the X 2 distributions and therefore do not greatly limit the usefulness of the chi-squared 
quality parameters for providing quantitative validation of experimental tomography results, as discussed in the next 
section. 

Finally, for a given set of experimental data, the reconstruction quality is calculated using Eq. (|14[) with the number 
of constraints c calculated by counting the number of nonzero eigenvalues in the reconstructed density matrix and 
applying Eq. ([T3|) . Figure[7ta) shows the resulting distribution for the two-qubit, 36-measurement case discussed above. 
Because of the observed discrepancy between the predicted and optimal distributions, 7.81 ± 0.07% (1.79 ± 0.03%) 
of simulated reconstructions had a reconstruction quality above the 95% (99%) confidence level threshold, instead of 
the 5% (1%) expected for data containing no excess noise. This is nevertheless a substantial improvement over the 
naive reconstruction quality calculated assuming d 2 constraints. 

At this point, there is a related approach to analysing the reconstruction quality which is explicitly motivated 
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FIG. 6: Numerical simulations illustrating the dependence of the X 2 test statistic's distribution on the rank of the reconstructed 
density matrix. Measured means (a) and standard deviations (b) of X 2 distributions for a range of systems (with different 
dimensions and measurement set sizes) with the reconstructed states grouped by the number of nonzero eigenvalues, plotted 
against the expected number of independent random variables, defined by Eq. (|15|) . Solid curves show the expected values of 
M—c and y/M—c. The circles (o) show single-qubit simulations with 6, 8, 12 and 20 measurement settings; the other points 
show two-qubit simulations with 6, 8 and 12 measurements per qubit (M — 36, M = 64 and M — 144) for the pure-state and 
full-rank reconstructions (diamonds: 0) and 2 and 3 nonzero eigenvalues (squares: □). (c) Distribution of the test statistic 
for 150000 randomly chosen, mixed two-qubit state reconstructions with 6 measurements per qubit, grouped by the number of 
nonzero eigenvalues: (i) one, (ii) two, (iii) three and (iv) four (full-rank) nonzero eigenvalues. Red curves show x 2 probability 
distributions for the expected number of independent parameters and green curves show the optimal \ 2 distributions with a 
fit to the number of DOFs (over arbitrary real numbers). 
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FIG. 7: Reduced chi-squared quality parameter distributions with the number of constraints: (a) determined by counting the 
nonzero eigenvalues in the reconstructed states; and (b) calculated as an average estimated using Monte-Carlo simulations. 

by the assumption that the underlying statistics arc governed by a general x 2 distribution. The x 2 distribution is 
characterised by its so-called survival probability function 

P( X 2 >X 2 ) = Y,^P(X 2 >X 2 ), (16) 

3 

where Wj are the weights which define the shape of the underlying x 2 distribution. Knowledge of Wj would therefore 
immediately allow the calculation of a p-value which characterises the reconstruction quality. However, since deter- 
mining wj analytically is known to be difficult (4ll . |42| , the standard approach is to estimate them using a standard 
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FIG. 8: (a) The effect of a laboratory air-conditioner on tomographic counts. The counts show experimentally observed 
"normalisations" for a two-qubit state, determined by adding the total counts measured in four successive measurements of the 
orthogonal states in a given basis — there are 9 such sets in each two-qubit tomography made using 6 measurements per qubit. 
The fluctuations are far larger than the expected Poissonian noise (~ 50 counts) and clear systematic cycling is apparent, 
(b) The total probability of diagnosing the presence of additional noise versus the relative drift-noise ratio for 95% (o) and 
99% (□) confidence using the reconstruction quality, Q(q), from Eq. (|14[) . with the number of constraints determined from 
Eq. (|15[) either: (red) by simply counting the number of nonzero eigenvalues; or (blue) via Monte-Carlo simulation to calculate 
an average number for each set of data. 

Monte-Carlo technique [36|, [42J. Specifically, either the raw data or the reconstructed state can be used to generate 
many samples of noisy data, given some particular noise model such as the Poissonian statistics assumed in these 
simulations. Each resulting tomography then gives a density matrix and the number of nonzero eigenvalues can be 
counted as described above. The distribution of these outcomes is then used to estimate the weights, Wj, and a p- value 
can be calculated from Eq. (Tpp)) . with DOFs for the Xj distributions determined from Eq. (|T5|) . From Eq. ([TB|) . it is 
easy to show that the mean of the underlying distribution is given by (4lj 

« = (X 2 ) = ^Wj{Xj) = Yl w i K i- ( 17 ) 

o j 

It is therefore still useful as an aid to intuition to define a normalised reconstruction quality using R in Eq. (|14[) . 
Figure \7\b) shows the resulting distribution for 1000 random two-qubit states (with M = 36). This method is 
obviously more computationally intensive, since it relies on repeated Monte-Carlo simulations to calculate the number 
of DOFs, but since this is often required already to determine error bars for derived physical quantities, this may not 
be a substantial further inconvenience. The performance of these two techniques will be compared in the next section. 

E. Applications: diagnosing systematic errors in experimental tomography 

Ultimately, the main goal of calculating a goodncss-of-fit statistic like the reconstruction quality is to be able 
to detect and diagnose whether the observed results really can be explained by the proposed noise model used to 
calculate the tomographic errors. As discussed already, in real-world experiments, there may be many sources of 
errors other than those introduced by the statistics of finite counting. For example, for nondeterministic quantum 
sources, such as photon sources, there may be systematic drift in the brightness of the source, either from the source 
interaction itself or from simple alignment drift, especially since tomography often requires long counting times. 
Alternatively, there may be systematic errors in the measurement settings, as is quite common in ion-trapping and 
superconducting systems [l6|, or unknown setting-related distortion in detector efficiencies, such as might occur in 
an integrated photonic detector which has polarisation-dependent sensitivity [43||. Or there may even be some form 
of extra random statistical noise which does not originate from the expected effects of finite counting times, such 
as squeezed states exhibiting super-Poissonian counting statistics |44j |. experiments operating in a regime where the 
Poissonian approximation breaks down, or statistical fluctuations in critical operating parameters like magnetic flux 
in superconducting circuits. Furthermore, these extra sources of error can occur over a range of time scales, from 
much smaller than the single-measurement count time to longer than the overall tomography time and the effect on 
the reconstruction can vary greatly as a result. 

To illustrate the usefulness of the reconstruction quality, in this section, I study the first example of systematic drift 
in source brightness, one which is particularly problematic in photonic experiments. Photonic experiments often use 
coupling to single-mode fibres or integrated chips to improve the visibility of quantum interference, but can become 
very sensitive to beam alignment as a result, even at the scale of temperature fluctuations induced by laboratory 




14 




reconstruction quality reconstruction quality reconstruction quality reconstruction quality 



(a) (b) (c) (d) 



FIG. 9: Distribution of the reconstruction quality for a source with systematic, sinusoidal fluctuations in source brightness 
with a relative drift-noise ratio of 1.0 and a period of 9.5 count settings. Distributions calculated from 10,000 randomly chosen, 
mixed two-qubit state reconstructions with 6 measurements per qubit, grouped by the number of nonzero eigenvalues: (a) one, 
(b) two, (c) three and (d) four (full-rank) nonzero eigenvalues. Red curves show \ 2 probability distributions for the expected 
number of independent parameters. Vertical lines show the expected mean (solid), the first few standard deviations on either 
side of the mean (dotted) and the cut-off values for diagnosing the presence of unexpected noise with 95% and 99% confidence. 



air-conditioning systems. Often it is possible to ensure experimentally that the detailed conditions of the experiment 
are unchanged as a result (e.g., the source always produces the same state), but it is sometimes harder to eliminate 
drift in overall coupling efficiency and it is not uncommon to observe peak-to-peak fluctuations in brightness of 20% 
or more [Fig. [S^a)]. 

Using the familiar case of random photonic, two-qubit mixed states and 6 measurements per qubit, a simulated 
systematic, sinusoidal fluctuation in source brightness is added to the "ideal" counts before the simulated Poissonian 
noise is finally added on top. The sensitivity of a tomography to such systematic fluctuations is determined by the 
amplitude of those fluctuations relative to the average size of the statistical noise. Therefore, to make the scale 
meaningful, the relative amplitude of the systematic fluctuations is set in proportion to average expected standard 
deviation of the statistical noise {1/\J~N) over all counts in the simulation. Figure |9] shows the normalised quality 
distributions for a relative drift-noise ratio of 1.0 (with a period of 9.5 count settings) for 10,000 random states with 
a mean normalisation of 2000 overall photons per measurement setting, separated according to number of nonzero 
eigenvalues, and with the vertical lines showing the expected mean of 1 and the first few standard deviations for 
the corresponding reduced x 2 distribution. Each case shows a shift which is clearly observable in the distributions 
of the quality parameter. For each reconstruction, the reconstruction quality is then compared against the cut-off 
set by the desired x 2 confidence interval for the appropriate number of independent variables. The total probability 
of diagnosing the presence of additional noise is plotted against the relative drift-noise ratio for two typical desired 
confidences of 95% and 99% in Fig. EJb) (red data). Thus, with this type of drift, using the fit quality would diagnose 
the presence of unknown noise with 95% confidence for more than 50% of tomographic reconstructions given a system 
that exhibits brightness fluctuations which are only around 60% the size of the statistical noise. Note that the overall 
diagnosis rates will depend somewhat on the specific noise parameters, such as its oscillation period. The diagnosis 
probabilities for zero noise are 0.075 ±0.003 and 0.016 ±0.001 for 95% and 99% confidence, respectively, indicative of a 
small, but observable effect from the discrepancies between expected and observed X 2 distributions for rank-deficient 
reconstructed states (discussed above). 

A similar numerical study was made using the alternative Monte-Carlo technique for assessing the reconstruction 
quality. Using the same system and conditions, for each of 1000 data sets, 100 samples were used to estimate the 
underlying \ 2 distribution and calculate the p-value which directly assessed the reconstruction quality. These results 
are also plotted in Fig. [8jb) (blue data) and show similar performance in general to the simpler technique. The 
diagnosis probabilities for zero noise are 0.063 ± 0.005 and 0.013 ± 0.002 for 95% and 99% confidence, respectively, 
again slightly more than the expected values of 5% and 1%. 

For both of these methods, the results show rates of over-diagnosis that are substantially smaller than those obtained 
based on making the assumption of d 2 constraints for all density matrices. This demonstrates that making a more 
careful assessment of free parameters in the reconstruction greatly improves the faithfulness and usefulness of the 
standard and reduced chi-squared parameters as measures for diagnosing noise. 



F. Conclusions and future directions 



In this paper, I have outlined an intuitive approach to assessing reconstruction quality based on standard chi-squared 
statistical techniques. I showed that the physicality constraints modify the number of free parameters available to 
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the optimisation process, making it more complicated to predict the characteristics of the expected underlying x 2 
distribution, such as its mean. Ignoring this and naively using the standard value of d 2 leads to strongly over- 
estimated noise detection, which negates the value of such measures to provide meaningful, quantitative technical 
noise diagnosis. To address this problem, I have taken a simple, heuristic approach to accounting for the effects of 
physicality constraints based on counting the number of free parameters in rank-deficient mixed states. Specifically, 
I have suggested two related ways of calculating a normalised test statistic and demonstrated that they perform 
substantially more reliably in typical scenarios than the naive approach, particularly in tomographic reconstructions 
based on over-complete measurement sets. One is very easy to calculate, while the other, though more computationally 
intensive, may provide a more reliable representation of the underlying \ 2 statistics. Numerical simulations showed 
that both measures are capable of diagnosing the presence of systematic fluctuations in source brightness, a typical 
problem in modern photonic experiments, even at noise levels lower than the ever-present statistical noise. Given the 
substantial difference in computational complexity of these two techniques, it would be interesting in the future to 
carry out a detailed comparison of their performance. It is also worth emphasising that the same approach should also 
offer benefits when applied to other forms of hypothesis tests, such as the generalised likelihood ratio test introduced 
in Ref. [l6| , where the same tendency towards over-diagnosis of noise would be expected [1^] . 

I suggest that such a measure of reconstruction quality should always be reported with tomographic results, since it 
is as important to interpreting the data as any Monte-Carlo estimates of errors in physical quantities derived from the 
density matrix. In fact, it could be argued that Monte-Carlo error estimation is meaningless without the supporting 
information provided by the reconstruction quality or an equivalent, to verify that the residual discrepancy between 
the observed data and the output state can indeed be plausibly explained by the noise model used to derive the error 
estimates. 

In future work, further investigation is necessary to explore the origin of the discrepancy between the simulation 
and expected distributions for rank-deficient matrices with more than one eigenvalue, and to study in more detail 
the dependence of this discrepancy on factors such as mixture and precise form of the target or reconstructed state. 
In particular, a promising path to pursue could be to explore this behaviour more fully in the context of minimally 
complete sets, where only the effects resulting from the physicality constraints imposed during reconstruction will be 
present. 

It would also be interesting to see how the key idea of assessing the quality of a reconstruction in some way might 
be generalised to the context of region-based or Bayesian quantum estimation (e.g., [H, [ID, HH). Finally, I would 
note that the good agreement between the expected theory and the simulations with mixed Werner-like states also 
show that the reconstruction quality may already be suitable for use with other forms of tomography which don't 
produce rank-deficient output matrices (e.g., fl2j). 

Ultimately, if quantum tomography is to fulfil its purpose of providing a rigorously quantitative way to characterise 
a quantum system, we need a comprehensive recipe for assessing the quality of tomographic reconstructions. This 
is critical, both in order to authenticate the reconstruction and any physical quantities derived from it, and also to 
provide a reliable and accepted way to diagnose the presence of noise not accounted for in the expected noise model. 
I have shown that a critical challenge in achieving this goal arises from the difficulty with identifying and quantifying 
the effects of physicality constraints in the reconstruction process. Developing a recipe which is both deductive and 
reliable in situations where the physicality constraints come into play represents a major open problem for the field. 
Moreover, if tomography is going to be genuinely useful as an everyday laboratory tool for trouble-shooting and 
benchmarking performance in quantum information processing experiments, it is also vital that the recipe provides a 
measure which is simple and fast to calculate. This work aims to highlight the importance of these requirements and 
to outline a partial solution which can be implemented immediately by experimentalists. 
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